Two simple ways to detect a text file's encoding in Python [based on the file header and the cchardet library]

  • 2020-05-12 02:49:09
  • OfStack

This article shows two simple ways to detect the encoding of a text file in Python. They are shared here for your reference:

1. Check the file header (BOM).


# Check whether the file starts with the UTF-8 BOM (byte order mark)
def IsUtf8BomFile(pathfile):
  # Open in binary mode so the raw bytes are compared; 'with' closes the file
  with open(pathfile, mode='rb') as f:
    return f.read(3) == b'\xef\xbb\xbf'
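
The function above only recognizes the UTF-8 BOM. As a minimal sketch of how the same header check could be extended to other Unicode encodings, the following uses the BOM constants from the standard `codecs` module. The names `detect_bom` and `_BOMS` are hypothetical, introduced here for illustration only, and a file without a BOM simply yields `None`.

```python
import codecs

# Check the longer BOMs first: the UTF-32 LE BOM starts with the same
# two bytes as the UTF-16 LE BOM, so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]

def detect_bom(pathfile):
    """Return a Python codec name if the file starts with a known BOM, else None."""
    with open(pathfile, 'rb') as f:
        head = f.read(4)  # the longest BOM checked here is 4 bytes
    for bom, codec_name in _BOMS:
        if head.startswith(bom):
            return codec_name
    return None
```

The returned names are Python codecs that consume the BOM while decoding, so the result can be passed to `open(pathfile, encoding=...)` directly. Many UTF-8 files carry no BOM at all, so a `None` result does not mean the file is not UTF-8.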

2. Use the cchardet library.


>>> import cchardet
>>> cchardet.detect(open(pathfile, 'rb').read())
{'encoding': 'UTF-8', 'confidence': 0.9900000095367432}
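
The result is a dictionary with an `encoding` guess and a `confidence` score, as shown in the output above. One possible way to use it is sketched below: decode the bytes with the guessed encoding, and fall back to a default when the guess is missing or weak. The helper `decode_with_detection`, the `min_confidence` threshold of 0.5, and the UTF-8 fallback are all assumptions made for illustration; they are not part of cchardet.

```python
def decode_with_detection(raw, detection, min_confidence=0.5, fallback='utf-8'):
    """Decode raw bytes using a detection dict shaped like the one shown above.

    `detection` is expected to look like {'encoding': ..., 'confidence': ...}.
    Either value is treated as possibly None (an assumption about the
    detector's behavior when it cannot make a guess).
    """
    encoding = detection.get('encoding')
    confidence = detection.get('confidence') or 0.0
    if not encoding or confidence < min_confidence:
        encoding = fallback
    try:
        return raw.decode(encoding, errors='replace')
    except LookupError:
        # The detector named an encoding that Python has no codec for.
        return raw.decode(fallback, errors='replace')

# Example with a hand-written detection result in the same shape:
text = decode_with_detection(b'plain text', {'encoding': 'UTF-8', 'confidence': 0.99})
```

In real use, the second argument would be the value returned by `cchardet.detect(raw)` for the same bytes. Using `errors='replace'` keeps the call from raising on undecodable bytes, at the cost of silently substituting replacement characters.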


I hope this article has been helpful to you in Python programming.

